76 research outputs found

    Using multiclass classification algorithms to improve text categorization tool: NLoN

    Abstract. Natural language processing (NLP) and machine learning techniques have been widely utilized in the mining software repositories (MSR) field in recent years. Separating natural language from source code is a pre-processing step that is needed in both the NLP and MSR domains for better data quality. This paper presents the design and implementation of a multi-class classification approach that is based on the existing open-source R package Natural Language or Not (NLoN). This article also reviews the existing literature on MSR and NLP. The review classifies the information sources and approaches of MSR in detail, and also focuses on the text representation and classification tasks of NLP. In addition, the design and implementation methods of the original paper are briefly introduced. Regarding the research methodology, since the research goal is technology-oriented, i.e., to improve the design and implementation of existing technologies, this article adopts the design science research methodology and describes how it was applied. This research implements an open-source Python library, NLoN-PY, which is hosted on GitHub; users can also directly use the version published on PyPI. Since NLoN has achieved comparable performance on two-class classification tasks with the Lasso regression model, this study evaluated other multi-class classification algorithms, i.e., Naive Bayes, k-Nearest Neighbours, and Support Vector Machine. Using 10-fold cross-validation, the expanded classifier achieved an AUC of 0.901 for the 5-class classification task and an AUC of 0.92 for the 2-class task. Although the design of this study did not show a significant performance improvement compared to the original design, the impact of unbalanced data distribution on performance was detected and the category of the classification problem was also refined in the process. These findings on the multi-class classification design can provide a research foundation or direction for future research.

    ATOMS: ALMA Three-millimeter Observations of Massive Star-forming regions - IX. A pilot study towards IRDC G034.43+00.24 on multi-scale structures and gas kinematics

    We present a comprehensive study of the gas kinematics associated with density structures at different spatial scales in the filamentary infrared dark cloud G034.43+00.24 (G34). This study makes use of the H¹³CO⁺ (1-0) molecular line data from the ALMA Three-millimeter Observations of Massive Star-forming regions (ATOMS) survey, which has spatial and velocity resolutions of ~0.04 pc and 0.2 km s⁻¹, respectively. Several tens of dendrogram structures have been extracted in the position-position-velocity space of H¹³CO⁺, which include 21 small-scale leaves and 20 larger-scale branches. Overall, their gas motions are supersonic, but the leaves tend to be less dynamically supersonic than the branches. For the larger-scale branch structures, the observed velocity-size relation (i.e. velocity variation/dispersion versus size) is seen to follow the Larson scaling exponent, while the smaller-scale leaf structures show a systematic deviation and display a steeper slope. We argue that the origin of the observed kinematics of the branch structures is likely to be a combination of turbulence and gravity-driven ordered gas flows. In comparison, gravity-driven chaotic gas motion is likely at the level of small-scale leaf structures. The results presented in our previous paper and this current follow-up study suggest that the main driving mechanism for mass accretion/inflow observed in G34 varies at different spatial scales. We therefore conclude that a scale-dependent combined effect of turbulence and gravity is essential to explain the star-formation processes in G34. Peer reviewed.

    Role of hydrodynamic factors in controlling the formation and location of unconformity-related uranium deposits: insights from reactive-flow modeling

    The role of hydrodynamic factors in controlling the formation and location of unconformity-related uranium (URU) deposits in sedimentary basins during tectonically quiet periods is investigated. A number of reactive-flow modeling experiments at the deposit scale were carried out by assigning different dip angles and directions to a fault and various permeabilities to hydrostratigraphic units. The results show that the fault dip angle and direction, and the permeability of the hydrostratigraphic units, govern the convection pattern, temperature distribution, and uranium mineralization. A vertical fault results in uranium mineralization at the bottom of the fault within the basement, while a dipping fault leads to precipitation of uraninite below the unconformity either away from or along the plane of the fault, depending on the fault permeability. A more permeable fault causes uraninite to precipitate along the fault plane, whereas a less permeable one gives rise to the precipitation of uraninite away from it. No economic ore mineralization can form when either very low or very high permeabilities are assigned to the sandstone or basement, suggesting that these units have an optimal window of permeability for the formation of uranium deposits. Physicochemical parameters also exert an additional control on both the location and grade of URU deposits. These results indicate that the difference in size and grade of different URU deposits may result from variation in fluid flow pattern and physicochemical conditions, caused by the change in structural features and hydraulic properties of the stratigraphic units involved.

    ATOMS: ALMA three-millimeter observations of massive star-forming regions - III. Catalogues of candidate hot molecular cores and hyper/ultra compact H II regions

    A correction has been published: Monthly Notices of the Royal Astronomical Society, Volume 511, Issue 1, March 2022, Pages 501–505, https://doi.org/10.1093/mnras/stac039. We have identified 453 compact dense cores in 3 mm continuum emission maps in the ALMA Three-millimetre Observations of Massive Star-forming regions survey, and compiled three catalogues of high-mass star-forming cores. One catalogue, referred to as the hyper/ultra compact (H/UC)-HII catalogue, includes 89 cores that enshroud H/UC HII regions as characterized by associated compact H40α emission. A second catalogue, referred to as pure s-cHMC, includes 32 candidate hot molecular cores (HMCs) showing rich spectra (N ≥ 20 lines) of complex organic molecules (COMs) and not associated with H/UC-HII regions. The third catalogue, referred to as pure w-cHMC, includes 58 candidate HMCs with relatively low levels of COM richness and not associated with H/UC-HII regions. These three catalogues of dense cores provide an important foundation for future studies of the early stages of high-mass star formation across the Milky Way. We also find that nearly half of H/UC-HII cores are candidate HMCs. From the number counts of COM-containing and H/UC-HII cores, we suggest that the duration of high-mass protostellar cores showing chemically rich features is at least comparable to the lifetime of H/UC-HII regions. For cores in the H/UC-HII catalogue, the width of the H40α line increases as the core size decreases, suggesting that the non-thermal dynamical and/or pressure line-broadening mechanisms dominate on the smaller scales of the H/UC-HII cores. Peer reviewed.

    Two Antarctic penguin genomes reveal insights into their evolutionary history and molecular changes related to the Antarctic environment. GigaScience

    Abstract. Background: Penguins are flightless aquatic birds widely distributed in the Southern Hemisphere. The distinctive morphological and physiological features of penguins allow them to live an aquatic life, and some of them have successfully adapted to the hostile environments in Antarctica. To study the phylogenetic and population history of penguins and the molecular basis of their adaptations to Antarctica, we sequenced the genomes of the two Antarctic-dwelling penguin species, the Adélie penguin [Pygoscelis adeliae] and emperor penguin [Aptenodytes forsteri]. Results: Phylogenetic dating suggests that early penguins arose ~60 million years ago, coinciding with a period of global warming. Analysis of effective population sizes reveals that the two penguin species experienced population expansions from ~1 million years ago to ~100 thousand years ago, but responded differently to the climatic cooling of the last glacial period. Comparative genomic analyses with other available avian genomes identified molecular changes in genes related to epidermal structure, phototransduction, lipid metabolism, and forelimb morphology. Conclusions: Our sequencing and initial analyses of the first two penguin genomes provide insights into the timing of penguin origin, fluctuations in effective population sizes of the two penguin species over the past 10 million years, and the potential associations between these biological patterns and global climate change. The molecular changes compared with other avian genomes reflect both shared and diverse adaptations of the two penguin species to the Antarctic environment.

    iTRAQ-Based Quantitative Proteomic Analysis of Tear Fluid in a Rat Penetrating Keratoplasty Model With Acute Corneal Allograft Rejection

    iTRAQ-based quantitative proteomic analysis of tear fluid in a rat penetrating keratoplasty model with acute corneal allograft rejection. Invest Ophthalmol Vis Sci. 2015;56:4117-4124. DOI:10.1167/iovs.14-16207 PURPOSE. This study aimed to develop a greater understanding of the mechanisms underlying acute corneal allograft rejection by identifying differentially expressed tear proteins at defined stages and discovering potentially important proteins involved in the process. METHODS. The isobaric tags for relative and absolute quantitation (iTRAQ)-two-dimensional liquid chromatography-tandem mass spectrometry (2DLC-MS/MS) technique was used to identify tear proteins showing significant alterations in a rat penetrating keratoplasty model at different time points. Bioinformatics technology was applied to analyze the significant proteins, and a potential protein was verified by Western blotting. RESULTS. A total of 269 proteins were quantified, and 118 proteins were considered to be significantly altered by at least 2.0- or 0.5-fold. For gene ontology annotations, the top enrichments were neurological disease, free radical scavenging, cell death and survival, and cell movement. For pathway analyses, the top enrichments were LXR/RXR activation, acute phase response signaling, clathrin-mediated endocytosis signaling, and coagulation system. Coronin-1A was verified as a potential protein involved in the early stage of acute corneal allograft rejection. CONCLUSIONS. This study first demonstrates that tear proteomics is a powerful tool for better understanding of the mechanisms underlying acute corneal rejection, and that coronin-1A in tears might be closely related to allograft rejection.